<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.21.2: https://docutils.sourceforge.io/" />
<title>Xapian Synonym Support</title>
<style type="text/css">

/*
:Author: David Goodger (goodger@python.org)
:Id: $Id: html4css1.css 9511 2024-01-13 09:50:07Z milde $
:Copyright: This stylesheet has been placed in the public domain.

Default cascading style sheet for the HTML output of Docutils.
Despite the name, some widely supported CSS2 features are used.

See https://docutils.sourceforge.io/docs/howto/html-stylesheets.html for how to
customize this style sheet.
*/

/* used to remove borders from tables and images */
.borderless, table.borderless td, table.borderless th {
  border: 0 }

table.borderless td, table.borderless th {
  /* Override padding for "table.docutils td" with "! important".
     The right padding separates the table cells. */
  padding: 0 0.5em 0 0 ! important }

.first {
  /* Override more specific margin styles with "! important". */
  margin-top: 0 ! important }

.last, .with-subtitle {
  margin-bottom: 0 ! important }

.hidden {
  display: none }

.subscript {
  vertical-align: sub;
  font-size: smaller }

.superscript {
  vertical-align: super;
  font-size: smaller }

a.toc-backref {
  text-decoration: none ;
  color: black }

blockquote.epigraph {
  margin: 2em 5em ; }

dl.docutils dd {
  margin-bottom: 0.5em }

object[type="image/svg+xml"], object[type="application/x-shockwave-flash"] {
  overflow: hidden;
}

/* Uncomment (and remove this text!) to get bold-faced definition list terms
dl.docutils dt {
  font-weight: bold }
*/

div.abstract {
  margin: 2em 5em }

div.abstract p.topic-title {
  font-weight: bold ;
  text-align: center }

div.admonition, div.attention, div.caution, div.danger, div.error,
div.hint, div.important, div.note, div.tip, div.warning {
  margin: 2em ;
  border: medium outset ;
  padding: 1em }

div.admonition p.admonition-title, div.hint p.admonition-title,
div.important p.admonition-title, div.note p.admonition-title,
div.tip p.admonition-title {
  font-weight: bold ;
  font-family: sans-serif }

div.attention p.admonition-title, div.caution p.admonition-title,
div.danger p.admonition-title, div.error p.admonition-title,
div.warning p.admonition-title, .code .error {
  color: red ;
  font-weight: bold ;
  font-family: sans-serif }

/* Uncomment (and remove this text!) to get reduced vertical space in
   compound paragraphs.
div.compound .compound-first, div.compound .compound-middle {
  margin-bottom: 0.5em }

div.compound .compound-last, div.compound .compound-middle {
  margin-top: 0.5em }
*/

div.dedication {
  margin: 2em 5em ;
  text-align: center ;
  font-style: italic }

div.dedication p.topic-title {
  font-weight: bold ;
  font-style: normal }

div.figure {
  margin-left: 2em ;
  margin-right: 2em }

div.footer, div.header {
  clear: both;
  font-size: smaller }

div.line-block {
  display: block ;
  margin-top: 1em ;
  margin-bottom: 1em }

div.line-block div.line-block {
  margin-top: 0 ;
  margin-bottom: 0 ;
  margin-left: 1.5em }

div.sidebar {
  margin: 0 0 0.5em 1em ;
  border: medium outset ;
  padding: 1em ;
  background-color: #ffffee ;
  width: 40% ;
  float: right ;
  clear: right }

div.sidebar p.rubric {
  font-family: sans-serif ;
  font-size: medium }

div.system-messages {
  margin: 5em }

div.system-messages h1 {
  color: red }

div.system-message {
  border: medium outset ;
  padding: 1em }

div.system-message p.system-message-title {
  color: red ;
  font-weight: bold }

div.topic {
  margin: 2em }

h1.section-subtitle, h2.section-subtitle, h3.section-subtitle,
h4.section-subtitle, h5.section-subtitle, h6.section-subtitle {
  margin-top: 0.4em }

h1.title {
  text-align: center }

h2.subtitle {
  text-align: center }

hr.docutils {
  width: 75% }

img.align-left, .figure.align-left, object.align-left, table.align-left {
  clear: left ;
  float: left ;
  margin-right: 1em }

img.align-right, .figure.align-right, object.align-right, table.align-right {
  clear: right ;
  float: right ;
  margin-left: 1em }

img.align-center, .figure.align-center, object.align-center {
  display: block;
  margin-left: auto;
  margin-right: auto;
}

table.align-center {
  margin-left: auto;
  margin-right: auto;
}

.align-left {
  text-align: left }

.align-center {
  clear: both ;
  text-align: center }

.align-right {
  text-align: right }

/* reset inner alignment in figures */
div.align-right {
  text-align: inherit }

/* div.align-center * { */
/*   text-align: left } */

.align-top    {
  vertical-align: top }

.align-middle {
  vertical-align: middle }

.align-bottom {
  vertical-align: bottom }

ol.simple, ul.simple {
  margin-bottom: 1em }

ol.arabic {
  list-style: decimal }

ol.loweralpha {
  list-style: lower-alpha }

ol.upperalpha {
  list-style: upper-alpha }

ol.lowerroman {
  list-style: lower-roman }

ol.upperroman {
  list-style: upper-roman }

p.attribution {
  text-align: right ;
  margin-left: 50% }

p.caption {
  font-style: italic }

p.credits {
  font-style: italic ;
  font-size: smaller }

p.label {
  white-space: nowrap }

p.rubric {
  font-weight: bold ;
  font-size: larger ;
  color: maroon ;
  text-align: center }

p.sidebar-title {
  font-family: sans-serif ;
  font-weight: bold ;
  font-size: larger }

p.sidebar-subtitle {
  font-family: sans-serif ;
  font-weight: bold }

p.topic-title {
  font-weight: bold }

pre.address {
  margin-bottom: 0 ;
  margin-top: 0 ;
  font: inherit }

pre.literal-block, pre.doctest-block, pre.math, pre.code {
  margin-left: 2em ;
  margin-right: 2em }

pre.code .ln { color: gray; } /* line numbers */
pre.code, code { background-color: #eeeeee }
pre.code .comment, code .comment { color: #5C6576 }
pre.code .keyword, code .keyword { color: #3B0D06; font-weight: bold }
pre.code .literal.string, code .literal.string { color: #0C5404 }
pre.code .name.builtin, code .name.builtin { color: #352B84 }
pre.code .deleted, code .deleted { background-color: #DEB0A1}
pre.code .inserted, code .inserted { background-color: #A3D289}

span.classifier {
  font-family: sans-serif ;
  font-style: oblique }

span.classifier-delimiter {
  font-family: sans-serif ;
  font-weight: bold }

span.interpreted {
  font-family: sans-serif }

span.option {
  white-space: nowrap }

span.pre {
  white-space: pre }

span.problematic, pre.problematic {
  color: red }

span.section-subtitle {
  /* font-size relative to parent (h1..h6 element) */
  font-size: 80% }

table.citation {
  border-left: solid 1px gray;
  margin-left: 1px }

table.docinfo {
  margin: 2em 4em }

table.docutils {
  margin-top: 0.5em ;
  margin-bottom: 0.5em }

table.footnote {
  border-left: solid 1px black;
  margin-left: 1px }

table.docutils td, table.docutils th,
table.docinfo td, table.docinfo th {
  padding-left: 0.5em ;
  padding-right: 0.5em ;
  vertical-align: top }

table.docutils th.field-name, table.docinfo th.docinfo-name {
  font-weight: bold ;
  text-align: left ;
  white-space: nowrap ;
  padding-left: 0 }

/* "booktabs" style (no vertical lines) */
table.docutils.booktabs {
  border: 0px;
  border-top: 2px solid;
  border-bottom: 2px solid;
  border-collapse: collapse;
}
table.docutils.booktabs * {
  border: 0px;
}
table.docutils.booktabs th {
  border-bottom: thin solid;
  text-align: left;
}

h1 tt.docutils, h2 tt.docutils, h3 tt.docutils,
h4 tt.docutils, h5 tt.docutils, h6 tt.docutils {
  font-size: 100% }

ul.auto-toc {
  list-style-type: none }

</style>
</head>
<body>
<div class="document" id="xapian-synonym-support">
<h1 class="title">Xapian Synonym Support</h1>

<!-- Copyright (C) 2007,2008,2011 Olly Betts -->
<div class="contents topic" id="table-of-contents">
<p class="topic-title"><a class="reference internal" href="#top">Table of contents</a></p>
<ul class="simple">
<li><a class="reference internal" href="#introduction" id="toc-entry-1">Introduction</a></li>
<li><a class="reference internal" href="#model" id="toc-entry-2">Model</a></li>
<li><a class="reference internal" href="#queryparser-integration" id="toc-entry-3">QueryParser Integration</a></li>
<li><a class="reference internal" href="#current-limitations" id="toc-entry-4">Current Limitations</a><ul>
<li><a class="reference internal" href="#explicit-multi-word-synonyms" id="toc-entry-5">Explicit multi-word synonyms</a></li>
<li><a class="reference internal" href="#backend-support" id="toc-entry-6">Backend Support</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="introduction">
<h1><a class="toc-backref" href="#toc-entry-1">Introduction</a></h1>
<p>Xapian supports storing a synonym dictionary, or thesaurus.  The
<tt class="docutils literal"><span class="pre">Xapian::QueryParser</span></tt> class can use it to expand terms in user query
strings, either automatically or when the user requests it with the explicit
synonym operator (<tt class="docutils literal">~</tt>).</p>
<p>Note that Xapian doesn't offer automated generation of the synonym dictionary.</p>
</div>
<div class="section" id="model">
<h1><a class="toc-backref" href="#toc-entry-2">Model</a></h1>
<p>The model for the synonym dictionary is that a term or group of consecutive
terms can have one or more synonym terms.  A group of consecutive terms is
specified in the dictionary by simply joining them with a single space between
each one.</p>
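<p>As a rough illustration of this model, the sketch below uses a plain Python
dictionary as a stand-in for the synonym dictionary.  It is not Xapian's storage
format or API; the helper names <tt class="docutils literal">add_synonym</tt> and <tt class="docutils literal">group_key</tt> are
hypothetical and exist only for this example.</p>
<pre class="code python literal-block">
# Toy in-memory stand-in for a synonym dictionary (an assumption for
# illustration, not how Xapian stores synonyms).
synonyms = {}


def group_key(terms):
    """Build the key for a group of consecutive terms: joined by single spaces."""
    return " ".join(terms)


def add_synonym(key, synonym):
    """Record one synonym for a key (a single term or a space-joined group)."""
    synonyms.setdefault(key, set()).add(synonym)


add_synonym("truck", "lorry")
add_synonym("truck", "van")
add_synonym(group_key(["stock", "market"]), "bourse")

print(sorted(synonyms["truck"]))  # ['lorry', 'van']
print(synonyms["stock market"])   # {'bourse'}
</pre>
<p>A single term and a group of terms share the same key space here; a group is
just a key that happens to contain spaces.</p>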
</div>
<div class="section" id="queryparser-integration">
<h1><a class="toc-backref" href="#toc-entry-3">QueryParser Integration</a></h1>
<p>For any of the QueryParser synonym features to work, you must first call
<tt class="docutils literal"><span class="pre">QueryParser::set_database()</span></tt> to specify the database to use.</p>
<p>If <tt class="docutils literal">FLAG_SYNONYM</tt> is passed to <tt class="docutils literal"><span class="pre">QueryParser::parse_query()</span></tt> then the
QueryParser will recognise <tt class="docutils literal">~</tt> in front of a term as indicating a request for
synonym expansion.  If <tt class="docutils literal">FLAG_LOVEHATE</tt> is also specified, you can use <tt class="docutils literal">+</tt>
and <tt class="docutils literal">-</tt> before the <tt class="docutils literal">~</tt> to indicate that you love or hate the
synonym-expanded expression.</p>
<p>A synonym-expanded term becomes the term itself OR-ed with any listed synonyms,
so <tt class="docutils literal">~truck</tt> might expand to <tt class="docutils literal">truck OR lorry OR van</tt>.  A group of terms is
handled in much the same way.</p>
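<p>The expansion described above can be sketched in a few lines of Python.  This
is a conceptual illustration only: the QueryParser builds a query object rather
than a string, and the <tt class="docutils literal">expand</tt> function and the sample dictionary are
assumptions made for this example.</p>
<pre class="code python literal-block">
def expand(term, synonyms):
    """Return the term OR-ed with any listed synonyms, as a readable string."""
    # The term itself always comes first; synonyms are sorted only to make
    # the output of this sketch deterministic.
    alternatives = [term] + sorted(synonyms.get(term, ()))
    return " OR ".join(alternatives)


synonyms = {"truck": {"lorry", "van"}}

print(expand("truck", synonyms))  # truck OR lorry OR van
print(expand("car", synonyms))    # car
</pre>
<p>A term with no listed synonyms is left unchanged, which is why <tt class="docutils literal">car</tt> comes
back as it went in.</p>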
<p>If the QueryParser will stem a term that is to be synonym expanded, it checks
for synonyms of the unstemmed form first, and then of the stemmed form.  This
lets you provide different synonyms for particular unstemmed forms.</p>
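<p>The lookup order can be pictured with the following sketch.  It assumes that
the stemmed form is consulted only when the unstemmed form has no entry, and it
uses a deliberately crude <tt class="docutils literal">toy_stem</tt> function in place of a real stemmer; both
the function names and the sample data are hypothetical.</p>
<pre class="code python literal-block">
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


def lookup(word, synonyms):
    """Check the unstemmed form first, then fall back to the stemmed form."""
    for key in (word, toy_stem(word)):
        if key in synonyms:
            return synonyms[key]
    return set()


synonyms = {
    "news": {"headlines"},  # entry for a particular unstemmed form
    "new": {"fresh"},       # entry for the stem that "news" reduces to here
    "truck": {"lorry"},
}

print(lookup("news", synonyms))    # {'headlines'}
print(lookup("trucks", synonyms))  # {'lorry'}
</pre>
<p>Because <tt class="docutils literal">news</tt> has its own entry, it never picks up the synonyms listed for
<tt class="docutils literal">new</tt>, while <tt class="docutils literal">trucks</tt> falls through to the entry for its stem.</p>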
<p>If <tt class="docutils literal">FLAG_AUTO_SYNONYMS</tt> is passed to <tt class="docutils literal"><span class="pre">QueryParser::parse_query()</span></tt> then the
QueryParser will automatically expand any term which has synonyms, unless the
term is in a phrase or similar.</p>
<p>If <tt class="docutils literal">FLAG_AUTO_MULTIWORD_SYNONYMS</tt> is passed to <tt class="docutils literal"><span class="pre">QueryParser::parse_query()</span></tt>
then the QueryParser will look at groups of terms separated only by whitespace
and try to expand them as term groups.  This is done in a &quot;greedy&quot; fashion, so
the first term which can start a group is expanded first, and the longest group
starting with that term is expanded.  After expansion, the QueryParser will
look for further possible expansions starting with the term after the last
term in the expanded group.</p>
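<p>The greedy strategy can be sketched as follows.  This is an illustration of the
matching order only, not the QueryParser's implementation: it works on a list of
already-split terms, considers only groups of two or more terms, and the
<tt class="docutils literal">greedy_expand</tt> function and sample dictionary are assumptions made for this
example.</p>
<pre class="code python literal-block">
def greedy_expand(terms, synonyms):
    """Expand term groups greedily: earliest start first, longest group first."""
    result = []
    i = 0
    while i != len(terms):
        # Try the longest candidate group starting at position i first,
        # shrinking down to a group of two terms.
        for end in range(len(terms), i + 1, -1):
            key = " ".join(terms[i:end])
            if key in synonyms:
                result.append([key] + sorted(synonyms[key]))
                i = end  # resume with the term after the expanded group
                break
        else:
            # No group starts at this term, so keep it as it is.
            result.append([terms[i]])
            i += 1
    return result


synonyms = {
    "stock market": {"bourse"},
    "stock market crash": {"market collapse"},
    "market crash": {"slump"},
}

for alternatives in greedy_expand("the stock market crash today".split(), synonyms):
    print(" OR ".join(alternatives))
# the
# stock market crash OR market collapse
# today
</pre>
<p>The longest group starting at <tt class="docutils literal">stock</tt> wins, so neither <tt class="docutils literal">stock market</tt> nor
the overlapping <tt class="docutils literal">market crash</tt> is expanded in this query.</p>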
</div>
<div class="section" id="current-limitations">
<h1><a class="toc-backref" href="#toc-entry-4">Current Limitations</a></h1>
<div class="section" id="explicit-multi-word-synonyms">
<h2><a class="toc-backref" href="#toc-entry-5">Explicit multi-word synonyms</a></h2>
<p>There ought to be a way to explicitly request expansion of multi-term synonyms,
probably with the syntax <tt class="docutils literal">~&quot;stock market&quot;</tt>, but this hasn't been
implemented yet.</p>
</div>
<div class="section" id="backend-support">
<h2><a class="toc-backref" href="#toc-entry-6">Backend Support</a></h2>
<p>Currently, synonyms are supported by glass and chert databases.  They work
with a single database or multiple databases (use <tt class="docutils literal"><span class="pre">Database::add_database()</span></tt> as
usual).  We have no plans to support them for the InMemory backend, but we do
intend to support them for the remote backend in the future.</p>
</div>
</div>
</div>
</body>
</html>
